Combination of Machine Learning Methods for Optimum Chinese Word Segmentation
Authors
Abstract
This article presents our recent work for participation in the Second International Chinese Word Segmentation Bakeoff. Our system performs two procedures: out-of-vocabulary extraction and word segmentation. We compose three out-of-vocabulary extraction modules, all based on character-based tagging but with different classifiers: maximum entropy, support vector machines, and conditional random fields. We also compose three word segmentation modules: character-based tagging with a maximum entropy classifier, a maximum entropy Markov model, and conditional random fields. All modules are based on previously proposed methods. We submitted three systems, each a different combination of these modules.
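To make the character-based tagging formulation concrete, the sketch below shows the usual B/M/E/S position-tag scheme and a typical character-window feature template. It illustrates the general technique only, not the authors' implementation; the helper names are hypothetical.

    # Illustrative sketch of character-based tagging for word segmentation.
    # Not the authors' code; the B/M/E/S tag scheme and the feature template
    # are assumptions based on common practice.

    def bmes_tags(words):
        """Convert a gold word sequence into per-character B/M/E/S tags."""
        tags = []
        for w in words:
            if len(w) == 1:
                tags.append("S")
            else:
                tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
        return tags

    def char_features(chars, i):
        """Character-window features for position i (a typical template)."""
        pad = lambda j: chars[j] if 0 <= j < len(chars) else "<PAD>"
        return {
            "c-1": pad(i - 1), "c0": pad(i), "c+1": pad(i + 1),
            "c-1c0": pad(i - 1) + pad(i), "c0c+1": pad(i) + pad(i + 1),
        }

    def tags_to_words(chars, tags):
        """Decode a B/M/E/S tag sequence back into words."""
        words, buf = [], ""
        for c, t in zip(chars, tags):
            buf += c
            if t in ("E", "S"):
                words.append(buf)
                buf = ""
        if buf:
            words.append(buf)
        return words

    # Example: the gold segmentation ["我", "喜欢", "北京"] yields the tags
    # ["S", "B", "E", "B", "E"].

Any of the classifiers named in the abstract (maximum entropy, SVM, CRF) can be trained on such (features, tag) pairs; how the submitted systems combine the module outputs is not specified in this abstract.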
Similar resources
A Simple and Effective Closed Test for Chinese Word Segmentation Based on Sequence Labeling
In many Chinese text processing tasks, Chinese word segmentation is a vital and required step. Various methods based on machine learning algorithms have been proposed to address this problem in previous studies. To achieve high performance, many of these studies combined external resources with various machine learning algorithms to aid segmentation. The goal of this paper is to construct...
Experimental Comparison of Discriminative Learning Approaches for Chinese Word Segmentation
Natural language processing tasks assume that the input is tokenized into individual words. In languages like Chinese, however, such tokens are not available in the written form. This thesis explores the use of machine learning to segment Chinese sentences into word tokens. We conduct a detailed experimental comparison between various methods for word segmentation. We have built two Chinese wor...
Text Window Denoising Autoencoder: Building Deep Architecture for Chinese Word Segmentation
Deep learning is the new frontier of machine learning research, which has led to many recent breakthroughs in English natural language processing. However, there are inherent differences between Chinese and English, and little work has been done to apply deep learning techniques to Chinese natural language processing. In this paper, we propose a deep neural network model: text window denoising ...
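As a rough illustration of the idea of a text-window denoising autoencoder (a sketch assuming PyTorch; the corruption scheme, layer sizes, and the class name TextWindowDAE are assumptions, not the model from the paper):

    # Minimal text-window denoising autoencoder sketch (PyTorch assumed).
    # Hyperparameters and the noise scheme are illustrative only.
    import torch
    import torch.nn as nn

    class TextWindowDAE(nn.Module):
        def __init__(self, vocab_size, emb_dim=50, window=5, hidden=300):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.Sequential(nn.Linear(window * emb_dim, hidden), nn.Tanh())
            self.decoder = nn.Linear(hidden, window * emb_dim)

        def forward(self, char_ids, drop_prob=0.2):
            x = self.emb(char_ids)                      # (batch, window, emb_dim)
            clean = x.flatten(1)                        # reconstruction target
            mask = (torch.rand_like(x[..., :1]) > drop_prob).float()
            noisy = (x * mask).flatten(1)               # corrupt: drop whole characters
            hidden = self.encoder(noisy)
            recon = self.decoder(hidden)
            return recon, clean

    # Training would minimize e.g. nn.MSELoss()(recon, clean); the learned
    # hidden layer could then initialize a deeper tagging network.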
A domain adaption Word Segmenter For Sighan Bakeoff 2010
We present a Chinese word segmentation system that ran on the closed track of the simplified Chinese word segmentation task of the CIPS-SIGHAN-CLP 2010 bakeoff. Our segmenter was built using an HMM. To fulfill the cross-domain segmentation task, we use a semi-supervised machine learning method to obtain the HMM model. Finally, we obtain the mean result over four domains: P=0.719, R=0.72
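For context, an HMM segmenter of this kind typically uses B/M/E/S position tags as hidden states, characters as emissions, and Viterbi decoding. The sketch below illustrates the decoding step only, with placeholder probability tables rather than the semi-supervised estimates described above.

    # Viterbi decoding sketch for a BMES-tag HMM word segmenter.
    # Transition/emission tables are placeholders; in practice they would be
    # estimated from segmented (and, here, additional unsegmented) text.
    import math

    STATES = ["B", "M", "E", "S"]

    def viterbi(chars, start_p, trans_p, emit_p, floor=1e-8):
        """Return the most likely B/M/E/S tag sequence for a character list."""
        V = [{s: math.log(start_p.get(s, floor)) +
                 math.log(emit_p[s].get(chars[0], floor)) for s in STATES}]
        back = [{}]
        for i in range(1, len(chars)):
            V.append({})
            back.append({})
            for s in STATES:
                best_prev, best_score = max(
                    ((p, V[i - 1][p] + math.log(trans_p[p].get(s, floor)))
                     for p in STATES), key=lambda x: x[1])
                V[i][s] = best_score + math.log(emit_p[s].get(chars[i], floor))
                back[i][s] = best_prev
        # Backtrace from the best final state.
        last = max(STATES, key=lambda s: V[-1][s])
        tags = [last]
        for i in range(len(chars) - 1, 0, -1):
            tags.append(back[i][tags[-1]])
        return list(reversed(tags))

In practice the transition table would also forbid impossible tag pairs such as B followed by B or S followed by E.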
A Long Dependency Aware Deep Architecture for Joint Chinese Word Segmentation and POS Tagging
Long-term context is crucial to the joint Chinese word segmentation and POS tagging (S&T) task. However, most machine learning based methods extract features from a window of characters. Due to the limitation of the window size, these methods cannot exploit long-distance information. In this work, we propose a long dependency aware deep architecture for the joint S&T task. Specifically, to simulate...
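To illustrate the contrast with fixed-window feature extraction, the sketch below shows a bidirectional LSTM character tagger that conditions each tag on the whole sentence. It assumes PyTorch and is not the architecture proposed in the paper; dimensions and the joint tag set are assumptions.

    # Bidirectional LSTM character tagger sketch for joint S&T.
    # Illustrative only; a joint tag set would pair segmentation and POS
    # labels (e.g. "B-NN", "E-VV", ...).
    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, num_tags)

        def forward(self, char_ids):
            x = self.emb(char_ids)          # (batch, seq_len, emb_dim)
            h, _ = self.lstm(x)             # each position sees the full sentence
            return self.out(h)              # per-character scores over joint tags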
Publication date: 2005